Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First pass at IAWG pass at pack/unpack chapter #449

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

dsolt
Copy link
Contributor

@dsolt dsolt commented Apr 24, 2023

Signed-off-by: David Solt [email protected]

@dsolt
Copy link
Contributor Author

dsolt commented Apr 24, 2023

Outstanding issues not addressed yet in the PR:

  • Use of calloc/free in macros and in PMIx_data_load, No means to free the data allocated by PMIx_Data_compress and decompress
  • “These errors cannot be detected during packing” - not sure what to do with this.
  • Text talks about resetting the unpack_ptr, but there is no macro to do that.
  • If more values exist in the buffer than can fit into the memory storage, then the function will unpack what it can fit into that location and return an error code indicating that the buffer was only partially unpacked.
  • Why PMIx_data_compress does not take a byte object
  • data_embed and data_load names could be improved. (copy and move?)
  • should the whole chapter be moved later in the document since its not really core pmix functionality

@dsolt dsolt removed the WorkInProgress Work In Progress label Mar 25, 2024
@dsolt
Copy link
Contributor Author

dsolt commented Apr 11, 2024

Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote - that is also useful data
  • If you've already commented on this issue, please still vote so
    we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end
    up accepting proposals that appear to have some overlap
    This is not a binding majority-rule vote, but it will be a very
    significant input into the corresponding ASC decision.

Here are the meanings for the emojis:

  • Hooray or Rocket: I support this so strongly that I
    want to be an advocate for it
  • Heart: I think this is an ideal solution
  • Thumbs up: I'd be happy with this solution
  • Confused: I'd rather we not do this, but I can tolerate it
  • Thumbs down: I'd be actively unhappy, and may even consider
    other technologies instead
    If you want to explain in more detail, feel free to add another
    comment, but please also vote on this comment.

@@ -199,14 +199,16 @@ \subsection{\code{PMIx_Data_pack}}

The pack function packs one or more values of a specified type into the specified buffer. The buffer must have already been
initialized via the \refmacro{PMIX_DATA_BUFFER_CREATE} or \refmacro{PMIX_DATA_BUFFER_CONSTRUCT}
macros --- otherwise, \refapi{PMIx_Data_pack} will return an error.
macros --- otherwise, \refapi{PMIx_Data_pack} will return an error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errr...that may not be true as there is no way for us to detect whether or not the buffer struct was initialized. It could just contain garbage.

Note that any data to be packed that is not hard type cast (i.e.,
not type cast to a specific size) may lose precision when unpacked
Note that packing data using a type that
does not explicitly specifies its size may
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifies should be specify

@@ -243,11 +245,12 @@ \subsection{\code{PMIx_Data_unpack}}


\begin{arglist}
\argin{source}{Pointer to a \refstruct{pmix_proc_t} structure containing the nspace/rank of the process that packed the provided buffer. A NULL value may be used to indicate that the source is based on the same \ac{PMIx} version as the caller. Note that only the source's nspace is relevant. (handle)}
\argin{source}{Pointer to a \refstruct{pmix_proc_t} structure containing the description of the process that packed the provided buffer. A NULL value may be used to indicate that the source is based on the same \ac{PMIx} version as the caller.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a little confusing terminology - elsewhere, we refer to a pmix_proc_t as being the "process identifier", not a "description of the process". A description of the process sounds like a much larger set of information - e.g., where it is located, what executable is involved, etc.


Note that it is possible for the buffer to be corrupted and that \ac{PMIx} will \textit{think} there is a proper variable type at the beginning of an unpack region --- but that the value is bogus (e.g., just a byte field in a string array that so happens to have a value that matches the specified data type flag). Therefore, the data type error check is \textit{not} completely safe.
The unpack function unpacks the next value (or values) from the given buffer. Providing an unsupported type flag
or specifying a data type that \textit{does not} match the type of the next item in the buffer will be reported as an error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is only true if the library was told to use fully described buffers. Otherwise, we have no idea what the actual type of the data is - we just unpack the bits and put them in the data object of the type you claimed they should be.

Note that it is possible for the buffer to be corrupted and that \ac{PMIx} will \textit{think} there is a proper variable type at the beginning of an unpack region --- but that the value is bogus (e.g., just a byte field in a string array that so happens to have a value that matches the specified data type flag). Therefore, the data type error check is \textit{not} completely safe.
The unpack function unpacks the next value (or values) from the given buffer. Providing an unsupported type flag
or specifying a data type that \textit{does not} match the type of the next item in the buffer will be reported as an error.
An attempt to read an uninitialized buffer or read beyond the end of the stored data held in the buffer will also return an error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the uninitialized buffer clause isn't really correct - we have no idea if the buffer was initialized or just contains random garbage.


Unpacking values is a "nondestructive" process --- i.e., the values are not removed from the buffer. It is therefore possible for the caller to re-unpack a value from the same buffer by resetting the unpack_ptr.

Warning: The caller is responsible for providing adequate memory storage for the requested data. The user must provide a parameter indicating the maximum number of values that can be unpacked into the allocated memory. If more values exist in the buffer than can fit into the memory storage, then the function will unpack what it can fit into that location and return an error code indicating that the buffer was only partially unpacked.

Note that any data that was not hard type cast (i.e., not type cast to a specific size) when packed may lose precision when unpacked by a non-homogeneous recipient. \ac{PMIx} will do its best to deal with heterogeneity issues between the packer and unpacker in such cases. Sending a number larger than can be handled by the recipient will return an error code generated upon unpacking --- these errors cannot be detected during packing.
Note that any data that is packed using a type that does not explicitly specifies its size may lose precision when unpacked by a non-homogeneous recipient. \ac{PMIx} will do its best to deal with heterogeneity issues between the packer and unpacker in such cases. Sending a number larger than can be handled by the recipient will return an error code generated upon unpacking --- these errors cannot be detected during packing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Believe you modified this above to replace "number" with "value"?

@abouteiller abouteiller added the Eligible Eligible for consideration by ASC label Apr 12, 2024
@abouteiller abouteiller added this to the PMIx v5.1 Standard milestone Apr 12, 2024
@@ -4,7 +4,7 @@
\chapter{Data Packing and Unpacking}
\label{chap:api_data_mgmt}

\ac{PMIx} intentionally does not include support for internode communications in the standard, instead relying on its host \ac{SMS} environment to transfer any needed data and/or requests between nodes. These operations frequently involve PMIx-defined public data structures that include binary data. Many \ac{HPC} clusters are homogeneous, and so transferring the structures can be done rather simply. However, greater effort is required in heterogeneous environments to ensure binary data is correctly transferred. \ac{PMIx} buffer manipulation functions are provided for this purpose via standardized interfaces to ease adoption.
\ac{PMIx} intentionally does not include support for internode communications in the standard, instead relying on its host \ac{SMS} environment to transfer any needed data and/or requests between nodes. However, to support \ac{SMS} environments which must frequently transfer \ac{PMIx} data structures between nodes and client applications that need to store or transfer \ac{PMIx} data structures, \ac{PMIx} provides the \acs{API} presented in this chapter. These operations frequently involve PMIx-defined data structures that include data that may have different binary representations on different hosts. Many \ac{HPC} clusters are homogeneous, and so transferring the structures can be done rather simply. However, greater effort is required in heterogeneous environments to ensure binary data is correctly transferred. \ac{PMIx} buffer manipulation functions are provided for this purpose via standardized interfaces to ease adoption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to discourage use by client code by saying more about how this is really intended only for communicating structures across server nodes.

Chap_API_Data_Mgmt.tex Show resolved Hide resolved
Chap_API_Data_Mgmt.tex Show resolved Hide resolved
Providing an unsupported type flag will likewise be reported as an error.

Note that any data to be packed that is not hard type cast (i.e.,
not type cast to a specific size) may lose precision when unpacked
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double may on line 207/208. Specifies -> specify (see Ralph's comment)

Copy link
Contributor Author

@dsolt dsolt Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also changed on line approx 275:

"Note that any data that is packed using a type that does not explicitly specify its size may lose precision when unpacked by a non-homogeneous recipient. \ac{PMIx} will do its best to deal with heterogeneity issues between the packer and unpacker in such cases. Sending a number larger than can be handled by the recipient will return an error code generated upon unpacking --- these errors cannot be detected during packing."

Chap_API_Data_Mgmt.tex Show resolved Hide resolved
Chap_API_Data_Mgmt.tex Outdated Show resolved Hide resolved
Chap_API_Data_Mgmt.tex Outdated Show resolved Hide resolved
Chap_API_Data_Mgmt.tex Show resolved Hide resolved
Chap_API_Data_Mgmt.tex Outdated Show resolved Hide resolved
Chap_API_Data_Mgmt.tex Show resolved Hide resolved
@abouteiller abouteiller added Major Text Change Correction RFC Request for Comment and removed Eligible Eligible for consideration by ASC labels May 10, 2024
@dsolt
Copy link
Contributor Author

dsolt commented Jul 4, 2024

Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote - that is also useful data
  • If you've already commented on this issue, please still vote so
    we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end
    up accepting proposals that appear to have some overlap
    This is not a binding majority-rule vote, but it will be a very
    significant input into the corresponding ASC decision.

Here are the meanings for the emojis:

  • Hooray or Rocket: I support this so strongly that I
    want to be an advocate for it
  • Heart: I think this is an ideal solution
  • Thumbs up: I'd be happy with this solution
  • Confused: I'd rather we not do this, but I can tolerate it
  • Thumbs down: I'd be actively unhappy, and may even consider
    other technologies instead
    If you want to explain in more detail, feel free to add another
    comment, but please also vote on this comment.

@abouteiller abouteiller added the Eligible Eligible for consideration by ASC label Jul 11, 2024
@naughtont3 naughtont3 added the First Vote Passed ASC first vote passed label Jul 18, 2024
@naughtont3
Copy link
Contributor

PR449 passed 1st vote (w/o changes) at ASC 2024-Q3 meeting.

@abouteiller abouteiller added Accepted as Provisional ASC vote passed. Accepted as Provisional! and removed Eligible Eligible for consideration by ASC RFC Request for Comment labels Oct 17, 2024
@abouteiller
Copy link
Contributor

Passed 2nd vote at ASC 24Q4 so it is now accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted as Provisional ASC vote passed. Accepted as Provisional! Clarification Correction First Vote Passed ASC first vote passed Major Text Change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants